Abstract: The Little-Hopfield network is an auto-associative
computational model of neural memory storage and retrieval.
This model is known to robustly store collections of randomly
generated binary patterns as stable-points of the network dynamics.
However, the number of binary memories so storable
scales linearly in the number of neurons, and it has been a
long-standing open problem whether robust exponential storage of
binary patterns is possible in such a network memory model.
In this note, we design elementary families of Little-Hopfield
networks that solve this problem affirmatively.

I. Introduction 

Inspired by early work of McCulloch-Pitts [1] and Hebb [2],
the Little-Hopfield model [3], [4] is a distributed
neural network architecture for binary memory storage and
denoising. In [4], Hopfield showed experimentally, using the
outer-product learning rule (OPR), that 0.15n binary patterns
(generated uniformly at random) can be robustly stored in such
an n-node network if some fixed percentage of errors in a
recovered pattern is tolerated. Later, it was verified that
this number is a good approximation to the actual theoretical
answer [5]. However, pattern storage without errors in recovery
using OPR is provably limited to n/(4 log n) patterns [6], [7].
Since then, improved methods to fit Little-Hopfield networks
more optimally have been developed [8], [9], [10], with the
most recent being [11]. Independent of the method, however,
arguments of Cover [12] can be used to show that the number
of (randomly generated) patterns storable in a Little-Hopfield
network with n neurons is at most 2n, although the exact value
is not known (it is ≈ 1.6n from experiments in [11]).

Nonetheless, theoretical and experimental evidence suggest
that Little-Hopfield networks usually have exponentially many
stable-states (i.e., fixed-points of the dynamics). For instance,
choosing weights for the model randomly (from a normal
distribution) produces an n-node network with $\approx 1.22^n$
fixed-points asymptotically [13], [14], [15]. However, a stored
pattern corrupted by only a few bit errors does not typically
converge under the network dynamics to the original.

To make precise mathematically the notion of large error
tolerance, we say that a sequence $B_n$ of binary pattern
collections is robustly stored by n-node Little-Hopfield networks
if a pattern in $B_n$ having $\alpha n$ of its bits altered at random
can be recovered (with probability limiting to 1 as $n \to \infty$)
by converging the network dynamics, where $0 < \alpha < 1$
is a constant independent of n. In this sense, the randomly
generated networks discussed above do not have robust
storage because the number of bits of corruption tolerated in
memory recovery does not increase with the number of nodes.

Fig. 1. Illustration of the energy landscape of a Little-Hopfield network
depicting the robust storage of all 4-cliques in graphs on v = 8 vertices. The
network dynamics sends a graph that is almost a clique to a graph with smaller
energy, until finally converging to the underlying 4-clique as a stable-point.

Another limitation of random networks is that stable-states
are difficult to determine from the network parameters. In [16],
a Little-Hopfield network with identical weights was shown to
have exponential storage on 2n nodes, the stored collection
consisting of binary vectors with exactly half of their bits
equal to one. Thus, it is possible to design a network with a
prescribed exponential number $\binom{2n}{n} \approx \frac{4^n}{\sqrt{\pi n}}$
of patterns. However, such a network is not able to denoise a
single bit of corruption. In particular, this collection of
memories is not stored robustly.

Very recently, more sophisticated (non-binary) discrete
networks have been developed [17], [18] that give exponential
memory storage. However, the storage in these networks is
not known to be robust. Moreover, determining or prescribing
the network parameters for storing these exponentially
many memories is non-trivial (the ideas involve expander
codes/graphs and solving linear equations over the integers).

In this note, we design Little-Hopfield networks that
robustly store an exponential number of binary patterns.
Moreover, our construction is elementary. Two concepts of
discrete mathematics are significant players in our development:
cliques in graphs and groups of permutations. We review this
technical material in Section II. Full statements of our results
appear in Section III, with proofs outlined in Section V. Some
preliminary applications are also presented in Section IV.

II. Technical Background 

A. Permutation groups 

In abstract algebra, a group is a set G with a multiplication
(or product) $a \circ b$ between elements $a, b \in G$ satisfying the
following three assumptions. We have (i) associativity of the
product: $(a \circ b) \circ c = a \circ (b \circ c)$ for all $a, b, c \in G$; (ii) a
multiplicative identity: there is a unique element $1 \in G$ with
$a \circ 1 = 1 \circ a = a$ for all $a \in G$; and (iii) existence of inverses:
for all $a \in G$, there exists $a^{-1} \in G$ with $a \circ a^{-1} = a^{-1} \circ a = 1$.

Groups are basic but fundamental objects in mathematics.
For instance, the set of positive real numbers $\mathbb{R}_{>0}$ forms a
group under multiplication. The set of integers $\mathbb{Z}$ also forms a
group, but with addition as the group product (and with 0 as
the identity element). An important family of non-commutative
groups are the n × n invertible matrices $GL_n$ with entries in
the reals $\mathbb{R}$ (the product being ordinary matrix multiplication).

Fix a positive integer v. The bijections from the set of
integers V = {1, ..., v} to itself are called the
permutations of V, and the set of all of them is denoted $S_v$.
The set $S_v$ has size $v! = v \cdot (v-1) \cdots 1$ and forms a group
with composition of functions as the product. Sometimes
permutations are displayed with two rows that indicate the
bijection. For instance, the permutation $\sigma \in S_5$ mapping the
numbers (1, 2, 3, 4, 5) bijectively to (2, 1, 5, 3, 4) and its
inverse $\sigma^{-1}$ can be represented:

$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 1 & 5 & 3 & 4 \end{pmatrix}, \qquad \sigma^{-1} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 1 & 4 & 5 & 3 \end{pmatrix}. \tag{1}$$

We remark that $S_v$ can be identified naturally as a subgroup
of $GL_v$; it is the set of v × v permutation matrices in $GL_v$.
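The group axioms and the identification of $S_v$ with permutation matrices can be checked exhaustively for small v. The following is a minimal sketch, not part of the original note: it stores permutations as 0-indexed tuples, and the helper names `compose`, `inverse` and `perm_matrix` are hypothetical.

```python
from itertools import permutations

import numpy as np


def compose(s, t):
    """Return the product s∘t: apply t first, then s (0-indexed tuples)."""
    return tuple(s[t[i]] for i in range(len(t)))


def inverse(s):
    """Return the permutation that undoes s."""
    inv = [0] * len(s)
    for i, image in enumerate(s):
        inv[image] = i
    return tuple(inv)


def perm_matrix(s):
    """v x v matrix sending the basis vector e_i to e_{s(i)}."""
    P = np.zeros((len(s), len(s)), dtype=int)
    for i, image in enumerate(s):
        P[image, i] = 1
    return P


v = 4
S_v = list(permutations(range(v)))  # all v! bijections of {0, ..., v-1}
identity = tuple(range(v))

# Every element has a two-sided inverse.
assert all(compose(s, inverse(s)) == identity for s in S_v)

# Composition of permutations matches multiplication of their matrices,
# which is what identifies S_v with a subgroup of GL_v.
s, t = S_v[5], S_v[9]
assert np.array_equal(perm_matrix(compose(s, t)), perm_matrix(s) @ perm_matrix(t))
```

Representing a permutation by its tuple of images keeps composition a one-line lookup; the matrix form is only needed when permutations act on vectors or weight matrices.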

Permutation groups appear frequently in mathematics and
its applications. One notable early example is the development
of Galois theory, which uses the theory of $S_5$ to deduce
the Abel-Ruffini Theorem. This result says that the general
fifth degree equation does not have closed-form solutions
(e.g., there is no complex number x expressible "in terms of
radicals" solving $x^5 + 2x + 1 = 0$). In contrast, equations up
to degree four are known to have such explicit solutions.



B. Little-Hopfield networks

Mathematically, a Little-Hopfield network $\mathcal{H} = (J, \theta)$ on n
nodes (e.g., neurons) {1, ..., n} consists of a real symmetric
weight matrix $J = J^\top \in \mathbb{R}^{n \times n}$ with zero diagonal and
a threshold vector¹ $\theta \in \mathbb{R}^n$. The possible states of the
network are all length-n binary strings $\{0, 1\}^n$, which we
represent as binary column vectors $x = (x_1, \ldots, x_n)^\top$, each
$x_i \in \{0, 1\}$ indicating the state of node i. Given any state
x, one (asynchronous) update of the dynamics on x consists
of replacing each $x_i$ in x (in consecutive order starting with
i = 1) with the value

$$x_i = H(J_i^\top x - \theta_i). \tag{2}$$

Here, $J_i$ is the ith column of J and H is the Heaviside function
given by H(r) = 1 if r > 0 and H(r) = 0 if r ≤ 0. (See
Fig. 1 in [11] for a detailed examination of a small network.)
The energy $E_x$ of a binary pattern x in a Little-Hopfield
network is

$$E_x(J, \theta) := -\frac{1}{2} x^\top J x + \theta^\top x = -\sum_{i<j} x_i x_j J_{ij} + \sum_{i=1}^{n} \theta_i x_i, \tag{3}$$

identical to the energy function for an Ising spin glass
probabilistic model from statistical physics [19]. In fact, the
dynamics of Little-Hopfield networks can be interpreted as
0-temperature Gibbs sampling of this energy function.

1 Throughout this work, vectors such as $\theta = (\theta_1, \ldots, \theta_n)^\top$ will always be
represented as columns, where $M^\top$ for a matrix M denotes its transpose.

A fundamental property of Little-Hopfield networks,
observed by Hopfield in [4], is that asynchronous dynamical
updates (2) do not increase the energy (3). In particular, one
can show that after a finite number of updates, any initial state
x converges to a fixed-point (also called stable-point or stored
memory) $x^*$ of the dynamics; that is, $x_i^* = H(J_i^\top x^* - \theta_i)$ for
each i = 1, ..., n. Given a binary pattern x, we say more
strongly that it is a strict local minimum if every x' with
exactly one bit different from x has a strictly larger energy:

$$0 > E_x - E_{x'} = (J_i^\top x - \theta_i)\,\delta_i, \tag{4}$$

where $\delta_i = 1 - 2x_i$ and $x_i$ is the bit that differs between x
and x'. It is straightforward to verify that if x is a strict local
minimum, then it is a fixed-point of the dynamics.
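The dynamics and the strict-local-minimum condition can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' code: it takes the energy to be $-\frac{1}{2} x^\top J x + \theta^\top x$ with a zero-diagonal symmetric J, uses the convention H(0) = 0, and the function names and the random test network are hypothetical.

```python
import numpy as np


def heaviside(r):
    """H(r) = 1 if r > 0, else 0 (assumed convention at r = 0)."""
    return 1 if r > 0 else 0


def energy(J, theta, x):
    """Assumed energy: -1/2 x^T J x + theta^T x."""
    return -0.5 * x @ J @ x + theta @ x


def update(J, theta, x):
    """One asynchronous sweep: refresh x_1, ..., x_n in consecutive order."""
    x = x.copy()
    for i in range(len(x)):
        x[i] = heaviside(J[:, i] @ x - theta[i])
    return x


def converge(J, theta, x, max_sweeps=1000):
    """Repeat sweeps until the state stops changing."""
    for _ in range(max_sweeps):
        x_new = update(J, theta, x)
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x


def is_strict_local_min(J, theta, x):
    """True if every single-bit flip strictly raises the energy."""
    e0 = energy(J, theta, x)
    for i in range(len(x)):
        flipped = x.copy()
        flipped[i] = 1 - flipped[i]
        if energy(J, theta, flipped) <= e0:
            return False
    return True


# Hypothetical random network, used only to exercise the functions.
rng = np.random.default_rng(0)
n = 8
A = rng.normal(size=(n, n))
J = (A + A.T) / 2          # symmetric weights
np.fill_diagonal(J, 0.0)   # zero diagonal
theta = rng.normal(size=n)
x0 = rng.integers(0, 2, size=n)
x_star = converge(J, theta, x0)
print(energy(J, theta, x0), energy(J, theta, x_star))
```

Updating bits in place inside one sweep is what makes the update asynchronous; a synchronous variant would compute all new bits from the old state and does not share the same monotone-energy guarantee.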

A permutation $\sigma \in S_n$ of the n nodes of a Little-Hopfield
network $\mathcal{H} = (J, \theta)$ gives rise to another network
$\sigma\mathcal{H} = (\sigma J, \sigma\theta)$, where $\sigma J$ is the matrix obtained from J by
permuting both its rows and columns by $\sigma$, and where $\sigma\theta$ is
$\theta$ also permuted by $\sigma$.
Fig. 2. Permutations acting on graphs. A simple graph on v = 5 vertices is
encoded as a binary vector of length $n = \binom{5}{2} = 10$. Applying the permutation
$\sigma$ in (1) to the vertices V = {1, 2, 3, 4, 5} induces a permutation in $S_n$ of the
vector encoding the graph, which we also denote $\sigma$ for notational simplicity.



C. Graphs 

A simple graph on v vertices V = {1, ..., v} is represented
by a set E of (unordered) pairs of vertices, called the edges
of the graph. We shall identify graphs on v vertices with binary
vectors x of length $n = \binom{v}{2} = \frac{v(v-1)}{2}$. A coordinate $x_e$ of x
is indexed by an edge e = {i, j} (i < j), and is one or zero
depending on whether e is contained in the edges of the graph
or not (respectively). For simplicity, we list the coordinates in
x lexicographically (i.e., in dictionary order). For $3 \le k \le v$,
define a k-clique to be a graph on v vertices that has edges
between each pair of a set of k vertices, but no other edges.
There are $\binom{v}{k} = \frac{v(v-1)\cdots(v-k+1)}{k!}$ graphs on v vertices that are
k-cliques. The complete graph $K_v$ on v vertices is a v-clique.
Relabeling the vertices of a graph is the same as applying a
permutation $\sigma \in S_v$ to them. This, in turn, induces a relabeling
or permutation of the edges of the graph, which is realized as
a permutation of the vector x representing it; Fig. 2 contains
an example. Note that any permutation of the vertices V for
a k-clique gives rise to another k-clique.
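The encoding of graphs as binary vectors and the action of a vertex relabeling on those vectors can be made concrete as follows. This is a small sketch with hypothetical helper names; the relabeling `sigma` is an illustrative permutation of {1, ..., 5} chosen for the example.

```python
from itertools import combinations


def edges(v):
    """Edges {i, j}, i < j, of K_v on vertices 1..v in lexicographic order."""
    return list(combinations(range(1, v + 1), 2))


def clique_vector(v, subset):
    """Binary edge vector of the clique on the given vertex subset."""
    s = set(subset)
    return tuple(1 if (i in s and j in s) else 0 for i, j in edges(v))


def relabel(v, sigma, x):
    """Edge vector of the graph x after relabeling vertex i as sigma[i]."""
    E = edges(v)
    index = {e: t for t, e in enumerate(E)}
    y = [0] * len(E)
    for (i, j), bit in zip(E, x):
        a, b = sorted((sigma[i], sigma[j]))
        y[index[(a, b)]] = bit
    return tuple(y)


def k_cliques(v, k):
    """Set of all edge vectors of k-cliques on v vertices."""
    return {clique_vector(v, S) for S in combinations(range(1, v + 1), k)}


# Illustrative relabeling of 5 vertices and a 3-clique on {1, 2, 3}.
sigma = {1: 2, 2: 1, 3: 5, 4: 3, 5: 4}
x = clique_vector(5, {1, 2, 3})
y = relabel(5, sigma, x)
print(x, y)
print(y in k_cliques(5, 3))
```

Keeping the lexicographic edge order in one place (`edges`) means the vector encoding, the induced edge permutation, and any weight matrix indexed by edge pairs all agree on coordinates.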

The storage networks we propose are Little-Hopfield
networks with states identified as simple graphs on v vertices. In
this case, entries of weight matrices $J \in \mathbb{R}^{n \times n}$ are indexed
lexicographically by pairs of edges e, f in $K_v$. A permutation
$\sigma$ on the vertices V = {1, ..., v} induces a permutation of
the edges of a graph, defining a new weight matrix $\sigma J$, which
is J with its rows and columns permuted accordingly.

III. Main results 

Recall the notion of robustness with parameter $\alpha \in (0, 1)$
from the introduction. The following is our first main result.

Theorem 1: For integers v = 2k, there is a family of
Little-Hopfield networks on $n = \binom{v}{2}$ nodes that robustly store (with
parameter $\alpha = 1/2$) all k-cliques in graphs on 2k vertices,
giving a total number of robustly stored memories on n nodes:

$$\binom{2k}{k} \sim \frac{2^{\sqrt{2n} + \frac{3}{4}}}{n^{1/4}\sqrt{\pi}}.$$

Another interpretation of Theorem 1 is that these n-node
networks have large numbers of patterns (on the order of
$2^{n/2}$ as $n \to \infty$) that converge under the dynamics to a
stored binary memory. In other words, the networks have
"large basins of attraction" around these stored cliques. For
a graphical depiction of one such network, see Fig. 1.
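To get a feel for the sizes involved, the number of k-cliques on v = 2k vertices can be compared with the number of nodes $n = \binom{2k}{2} = k(2k-1)$. The sketch below is illustrative only: it uses the standard Stirling estimate $\binom{2k}{k} \approx 4^k/\sqrt{\pi k}$ and rewrites it in terms of n using $2k \approx \sqrt{2n} + 1/2$; the function names are hypothetical.

```python
from math import comb, log2, pi, sqrt


def storage_counts(k):
    """Return (n, number of k-cliques) for graphs on v = 2k vertices."""
    v = 2 * k
    return comb(v, 2), comb(v, k)


def stirling_in_k(k):
    """Stirling estimate of the central binomial coefficient."""
    return 4.0 ** k / sqrt(pi * k)


def stirling_in_n(n):
    """The same estimate expressed through n = k(2k - 1)."""
    return 2.0 ** (sqrt(2 * n) + 0.75) / (n ** 0.25 * sqrt(pi))


for k in (5, 25, 50):
    n, count = storage_counts(k)
    # Columns: k, n, count, ratio to each estimate, log2(count) / sqrt(2n).
    print(k, n, f"{count:.3e}",
          f"{count / stirling_in_k(k):.4f}",
          f"{count / stirling_in_n(n):.4f}",
          f"{log2(count) / sqrt(2 * n):.4f}")
```

The last column makes the scaling visible: the number of stored memories grows exponentially in $\sqrt{2n}$, not in n itself.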

Theorem 1 says that we may store all cliques of a certain
fixed size in a Little-Hopfield network. A natural question is
whether a range of cliques are so storable as fixed-points of a
single network. Our next result answers this question.

Theorem 2: For each integer v = 2k, there is a Little-Hopfield
network on $n = \binom{v}{2}$ nodes that stores all
2"(1 - e- Cv ) ^-cliques in the range < I < ^ffc as 

strict local minima for constants $C \approx 0.002$ and $D \approx 13.928$.
Moreover, this range stores the most cliques.

We close this section by sketching the main ideas in our
proofs. We first show that there is a Little-Hopfield weight
matrix storing all k-cliques in some range if and only if there
is one which has a simple 3-parameter structure. Note that the
set of all J storing a given set of binary patterns as strict local
minima is the interior of a (possibly empty) convex polyhedron
(a finite intersection of closed half-spaces in Euclidean space).

Also, as discussed in Section II, the symmetric group $S_v$
acts on weight matrices J. Consider now the average of J
over the group of permutations:

$$J^*(x, y, z) := \frac{1}{v!} \sum_{\sigma \in S_v} \sigma J. \tag{5}$$

The matrix $J^*$ in (5) is invariant under the action of $S_v$; that
is, we have

$$\tau J^* = \frac{1}{v!} \sum_{\sigma \in S_v} \tau\sigma J = J^*,$$

since the function from $S_v$ to itself mapping $\sigma \mapsto \tau\sigma$ (for any
fixed $\tau \in S_v$) is a bijection.²

[Plot for Fig. 3: "Complete signal recovery under noise" versus "Bit corruption as fraction of signal".]

Fig. 3. One update of clean-up dynamics exhibits robustness with $\alpha = 1/2$.
We demonstrate that the exponential number of stored cliques in our networks
have large basins of attraction. For each vertex size v = 2k = 50, 75, 100,
we constructed a Little-Hopfield network storing all k-cliques as fixed-points
of the dynamics. Each such k-clique is represented as a binary vector of length
k(2k − 1). We then corrupted 200 (chosen uniformly at random) k-cliques
by changing a fixed percentage of their bits at random and ran the network
dynamics on each for one update step (i.e., a pass through all neurons once).
The plot shows the percentage of the 200 cliques that were correctly recovered
(exactly) as a function of the percent of the pattern that was corrupted. For
example, a network with v = 100 vertices robustly stores $\binom{100}{50} \approx 10^{29}$
memories (i.e., all 50-cliques in a 100-node graph) using binary vectors of
length 4950, each having $\binom{50}{2} = 1225$ nonzero coordinates. In this case,
the figure shows that a 50-clique memory represented with 4950 bits may be
recovered by the dynamics after flipping 2475 of these bits at random.

It is straightforward to check that acting by such a
permutation on a Little-Hopfield network that stores all k-cliques as
strict local minima will preserve that property. And since the
set of all such networks is convex, the convex combination
$J^*$ in (5) stores all k-cliques as strict local minima if J does.
One now observes that $J^*$ has only 3 free parameters, and
the remainder of the argument consists of optimizing these
parameters to determine networks that store ranges of cliques.
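The averaging step can be tried numerically on a small case. The sketch below is an illustration under stated assumptions, not the authors' code: it takes v = 4 (so n = 6 edges and 24 vertex permutations), starts from a hypothetical random symmetric matrix J, averages $\sigma J$ over all $\sigma \in S_v$, and groups the entries of the result by the number of vertices the two indexing edges share.

```python
from itertools import combinations, permutations

import numpy as np


def edges(v):
    """Edges of K_v on vertices 0..v-1 in lexicographic order."""
    return list(combinations(range(v), 2))


def edge_permutation(v, sigma):
    """Permutation of edge indices induced by the vertex permutation sigma."""
    index = {e: t for t, e in enumerate(edges(v))}
    return [index[tuple(sorted((sigma[i], sigma[j])))] for i, j in edges(v)]


def act(J, p):
    """Permute rows and columns of J: entry (e, f) moves to (p[e], p[f])."""
    out = np.empty_like(J)
    out[np.ix_(p, p)] = J
    return out


def group_average(J, v):
    """Average of sigma J over all vertex permutations sigma."""
    images = [act(J, edge_permutation(v, s)) for s in permutations(range(v))]
    return sum(images) / len(images)


def entries_by_overlap(J, v):
    """Collect entries J[e, f] by |e ∩ f| (the value 2 means e = f)."""
    E = edges(v)
    groups = {0: [], 1: [], 2: []}
    for a, e in enumerate(E):
        for b, f in enumerate(E):
            groups[len(set(e) & set(f))].append(J[a, b])
    return groups


v = 4
n = len(edges(v))
rng = np.random.default_rng(0)
A = rng.normal(size=(n, n))
J = A + A.T                      # hypothetical symmetric starting matrix
J_star = group_average(J, v)

# Each overlap class collapses to a single value after averaging.
for overlap, values in entries_by_overlap(J_star, v).items():
    print(overlap, round(min(values), 6), round(max(values), 6))
```

The three printed classes correspond to the three free parameters: one value for disjoint edge pairs, one for pairs sharing a vertex, and one for the diagonal.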

We remark that "averaging over the group," as is done
in (5), occurs frequently in mathematics. For instance, it
features prominently in Hilbert's work on invariant theory
in algebra, the construction of Haar measures in functional
analysis, and in representation theory, more generally. We
defer mathematical proof of robustness to future work, but
see Fig. 3 for its experimental verification.



2 An injective function from a finite set to itself is bijective. Thus, we only
need to verify injectivity (i.e., $\tau\sigma_1 = \tau\sigma_2$ implies $\sigma_1 = \sigma_2$). But if $\tau\sigma_1 =
\tau\sigma_2$, then $\sigma_1 = \tau^{-1}\tau\sigma_1 = \tau^{-1}\tau\sigma_2 = \sigma_2$, so that the map is injective.
IV. Applications

Applications to neuroscience. The Little-Hopfield network
is a model of emergent neural computation [3], [4], [20]. One
interpretation of the local³ dynamics in such a model is that
by minimizing an energy, the network tries to determine the
most probable memory conditioned on a noisy or corrupted
version. This concept is in line with arguments of several
researchers in theoretical neuroscience [21], [22], [23], [24],
[25], [26], [27], [28], and can be traced back to Helmholtz
[29]. In addition, recent analyses of spike distributions in
neural populations have shown that their joint statistics can
sometimes be well-described by the Ising model [30], [31],
[32], [33]. The now demonstrated ability of these networks
to store large numbers of patterns robustly suggests that the
Little-Hopfield architecture should be studied more fully as a
possible explanation of neural circuit computation.

Applications to computer science. The networks described
in this note have potential implications for several algorithmic
problems at the intersection of discrete mathematics,
probability, computer science, and machine learning. For instance,
a classical NP-complete problem is to determine large cliques
in graphs, the so-called MAXCLIQUE problem. We have
demonstrated here that when a clique is planted into an empty
graph and then "hidden" by turning edges on and off at
random, it is still possible to recover the original clique by
converging the local dynamics of Little-Hopfield networks.
See [34] for the most recent results on this problem.

Applications to coding theory. Our networks also give rise
to new approaches for constructing and working with binary
codes. For instance, our networks are easily parallelizable and
have similar robustness properties to the well-known optimal
codes of Reed-Solomon [35], which use the mathematical
machinery of polynomial rings over finite fields.

V. Proofs of theoretical results 

Consider the complete graph $K_v$ on v vertices, which has
$n = \binom{v}{2}$ edges, and fix $k \ge 3$. As discussed in Section II, a
binary vector $x \in \{0, 1\}^n$ is identified with a graph $G_x$ on v
vertices, where $x_e = 1$ if edge e is present in the graph. Let
$C_k := \{x \in \{0, 1\}^n : G_x \text{ is a } k\text{-clique}\}$ denote the set of edge
vectors representing k-cliques. Identify⁴ each Little-Hopfield
network with its symmetric weight matrix J. Consider the
3-parameter family of symmetric matrices $J \in \mathbb{R}^{n \times n}$:

$$J_{ef} = \begin{cases} x & \text{if } |e \cap f| = 1, \\ y & \text{if } |e \cap f| = 0, \\ z & \text{if } e = f, \end{cases}$$

for some $x, y, z \in \mathbb{R}$, where $|e \cap f|$ is the number of vertices
that the edges e and f share.

Let $\mathcal{H}_k$ denote the set of Little-Hopfield networks J which
store all k-cliques $C_k$ as strict local minima. We claim that
there exists a network J storing all k-cliques if and only
if there exists a Little-Hopfield network in the 3-parameter
family above storing all k-cliques. Also, let $\mathcal{H}_k^c$ denote the
central cone, which is the set of all matrices constructed by
averaging, as in (5), elements of $\mathcal{H}_k$.

3 The term "local" here refers to the fact that an update (2) to a neuron only
requires the feedforward inputs from its neighbors.

4 For expositional simplicity and without loss of generality, we move the
threshold vector $\theta$ into the diagonal of the weight matrix since the energy
(3) is unchanged by sending parameters $(J, \theta) \leftrightarrow (J + 2\,\mathrm{diag}(\theta), 0)$, where
$\mathrm{diag}(\theta)$ is the diagonal matrix with $\theta$ along the diagonal.

[Plot for Fig. 4, panel title: m = 5, M = 15.]

Fig. 4. Feasible region for network parameters giving exponential
storage. The shaded region is the feasible polygon for network parameters
giving clique storage for the range $5 \le k \le 15$. Black points are its vertices,
and the red, blue, and green lines are the linear constraints.

Proposition 1: The polyhedral cone $\mathcal{H}_k$ is non-empty if and
only if its central cone $\mathcal{H}_k^c$ is non-empty. Moreover, $\mathcal{H}_k^c$ is
non-empty, and $J(x, y, z) \in \mathcal{H}_k^c$ if and only if its parameters
(x, y, z) give the following vector all positive entries:

$$\begin{pmatrix} 4(k-2) & (k-2)(k-3) & -2 \\ -2(k-1) & -(k-1)(k-2) & 2 \\ 0 & -k(k-1) & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}. \tag{6}$$

Proof: The cone $\mathcal{H}_k$ is closed under the action of
permuting the labels of the vertices of the complete graph; that
is, $\mathcal{H}_k$ is an orbitope in the sense of Sanyal, Sottile and
Sturmfels [36]. The 3-parameter family of matrices above is
precisely the set of symmetric matrices invariant under this
action. This proves the first claim. The second follows by
direct computation. ■
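The "direct computation" can be cross-checked by brute force on a small case. The sketch below is illustrative and rests on assumptions that should be read as such: the parameter z is treated as a common threshold for a zero-diagonal weight matrix, the energy is taken to be $-\frac{1}{2} s^\top J s + z \sum_e s_e$, and the parameter choice (x, y, z) = (1, 0, 3.5) for k = 4, v = 8 is a hypothetical example rather than a value taken from the text. Under this convention the three entries of the vector in (6) are twice the energy increase caused by flipping, respectively, a clique edge, a non-edge sharing one vertex with the clique, and a non-edge disjoint from the clique.

```python
from itertools import combinations

import numpy as np


def edges(v):
    """Edges of K_v on vertices 0..v-1 in lexicographic order."""
    return list(combinations(range(v), 2))


def three_parameter_network(v, x, y, z):
    """Zero-diagonal weights (x: shared vertex, y: disjoint), threshold z."""
    E = edges(v)
    J = np.zeros((len(E), len(E)))
    for a, e in enumerate(E):
        for b, f in enumerate(E):
            if a != b:
                J[a, b] = x if set(e) & set(f) else y
    return J, np.full(len(E), float(z))


def proposition_vector(k, x, y, z):
    """The matrix-vector product appearing in the proposition."""
    A = np.array([[4 * (k - 2), (k - 2) * (k - 3), -2],
                  [-2 * (k - 1), -(k - 1) * (k - 2), 2],
                  [0, -k * (k - 1), 2]], dtype=float)
    return A @ np.array([x, y, z], dtype=float)


def energy(J, theta, s):
    return -0.5 * s @ J @ s + theta @ s


def stores_all_k_cliques(v, k, x, y, z):
    """Brute-force test that every k-clique is a strict local minimum."""
    J, theta = three_parameter_network(v, x, y, z)
    E = edges(v)
    for S in combinations(range(v), k):
        s = np.array([1.0 if (i in S and j in S) else 0.0 for i, j in E])
        e0 = energy(J, theta, s)
        for t in range(len(E)):
            flipped = s.copy()
            flipped[t] = 1.0 - flipped[t]
            if energy(J, theta, flipped) <= e0:
                return False
    return True


print(proposition_vector(4, 1.0, 0.0, 3.5))
print(stores_all_k_cliques(8, 4, 1.0, 0.0, 3.5))
```

The brute-force check only needs single-bit flips because strict local minimality is defined through patterns at Hamming distance one; by symmetry the three flip types are the only cases that can occur.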
Theorem 3 (Range storage): Fix m, M such that $3 \le m \le
M \le v$. The set $\bigcap_{k=m}^{M} \mathcal{H}_k$ of Little-Hopfield $\binom{v}{2}$-node
networks J storing all k-cliques for $m \le k \le M$ is
non-empty if and only if (m, M) satisfy the implicit inequality
$x_M - x_m < 0$, where

$$x_m = \frac{-\left(4m - \sqrt{12m^2 - 52m + 57} - 7\right)}{2(m^2 - m - 2)}, \qquad x_M = \frac{-\left(4M + \sqrt{12M^2 - 52M + 57} - 7\right)}{2(M^2 - M - 2)}.$$

In particular, a solution (m, M) is independent of v.

Proof: By Proposition 1, the intersection $\bigcap_{k=m}^{M} \mathcal{H}_k$ is
non-empty if and only if the intersection of their central
cones $\bigcap_{k=m}^{M} \mathcal{H}_k^c$ is non-empty. For $J(x, y, z) \in \bigcap_{k=m}^{M} \mathcal{H}_k^c$,
its parameters (x, y, z) need to satisfy the system of linear
inequalities (6) for all $m \le k \le M$. Solving this system gives
the above constraints on m and M. ■

Note that the intersection of $\bigcap_{k=m}^{M} \mathcal{H}_k^c$ with the plane
$z = -0.5$ is a polygon in $\mathbb{R}^2$. We display this polygon in Fig. 4
for (m, M) = (5, 15). Each k adds a triple of red, blue, and
green lines, corresponding to the three linear constraints in (6).
Note that the green constraints and all but two red constraints
are inactive. Vertices of this polygon are the intersections of
pairs of blue lines with parameters k, k + 1, and the two most
extreme red lines with parameters k = m and k = M.

For large m, we have $x_M - x_m < 0$ when $M < \frac{2+\sqrt{3}}{2-\sqrt{3}}\, m \approx
13.9282\,m$, and it is straightforward to translate this into
the statement of Theorem 2 from Section III using basic
facts about limiting binomial distributions and their normal
approximation.
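The range condition of Theorem 3 is easy to evaluate numerically. The following sketch takes the two closed-form expressions for $x_m$ and $x_M$ at face value and scans for the largest admissible M; the function names `x_lower`, `x_upper`, `range_is_storable` and `largest_M` are hypothetical.

```python
from math import sqrt


def x_lower(m):
    """The quantity x_m from the range-storage theorem."""
    return -(4 * m - sqrt(12 * m * m - 52 * m + 57) - 7) / (2 * (m * m - m - 2))


def x_upper(M):
    """The quantity x_M from the range-storage theorem."""
    return -(4 * M + sqrt(12 * M * M - 52 * M + 57) - 7) / (2 * (M * M - M - 2))


def range_is_storable(m, M):
    """Condition x_M - x_m < 0 for storing all k-cliques with m <= k <= M."""
    return x_upper(M) - x_lower(m) < 0


def largest_M(m, limit=10**6):
    """Largest M found by scanning upward from m while the condition holds."""
    M = m
    while M + 1 <= limit and range_is_storable(m, M + 1):
        M += 1
    return M


# Limiting ratio M/m for large m.
D = (2 + sqrt(3)) / (2 - sqrt(3))

print(range_is_storable(5, 15))
for m in (5, 50, 1000):
    print(m, largest_M(m), largest_M(m) / m)
print(D)
```

For small m the admissible ratio M/m is noticeably below the limiting constant, which only describes the large-m regime.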